Perception and visualization. Preprocessing.

Lecture 2

732A98

Human perception

  • How are visualizations perceived by different humans?
  • How do we know that a given visualization is correctly interpreted?

Perception:

  • Recognizing
  • Organizing (gathering, storing)
  • Interpreting (binding to knowledge)

Illusions

  • The human perceptual system is not perfect

Perception mechanism

  • Preattentive
    • Fast (250 ms)
    • Performed in parallel
  • Attentive
    • Slow
    • Uses short-term memory
    • Transforms simple visual features into structured objects
    • Compares them to memory models (e.g., a door)

Preattentive processing

  • Preattentive feature: shape
  • How quickly do you see a red circle?

Preattentive processing

  • Important: a combination (conjunction) of non-unique features cannot be detected preattentively
    • Many red objects
    • Many circle objects
  • How quickly can you find a unique object here?

Preattentive features

  • Length
  • Width
  • Size
  • Curvature (shape)
  • Hue
  • Intensity
  • Flicker
  • Direction of motion
  • 3D depth
  • Lighting direction

Preattentive visual tasks

  • The presence or absence of an object with a unique visual feature among distractors is detected preattentively
  • The boundary between two groups of elements, where each group shares the same visual feature, is detected preattentively
  • The movement of an object with a unique visual feature is tracked preattentively
  • The number of elements with a unique visual feature is estimated preattentively

Treisman's theory of preattentive processing

  • A figure is processed in parallel by checking individual feature maps
  • A specific preattentive task is performed in each feature map
  • A conjunction of features requires a serial search across maps, which takes time

Treisman's theory of preattentive processing

  • How quickly can you identify a boundary?

Metrics

  • What graphical features can be accurately perceived by humans?
  • How many distinct entities can be visualized without confusion?
  • How should we use color?
  • How should we combine features in a complex phenomenon?

Channel capacity: how many different levels of a feature we can perceive

  • 8 levels = 3 bits

Metrics

  • Position on a line: 10-15 levels (3.25 bits)
  • Size of squares: 4-5 levels (2.2 bits)
  • Color: hue 10 levels (3.1 bits), brightness 5 levels (2.1 bits)
  • Line length: 2.8 bits
  • Line orientation: 3 bits
  • Line curvature: 1.6-2.2 bits

Summary: 6-7 unique values max.
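Channel capacity in bits is the base-2 logarithm of the number of distinguishable levels (8 levels = log2(8) = 3 bits). A minimal sketch of the conversion in both directions, applied to the capacities listed above; the function names are illustrative only.

```python
import math

def levels_to_bits(levels: float) -> float:
    """Channel capacity (bits) for a given number of distinguishable levels."""
    return math.log2(levels)

def bits_to_levels(bits: float) -> float:
    """Number of distinguishable levels corresponding to a capacity in bits."""
    return 2 ** bits

# Capacities quoted on the previous slide (bits)
capacities = {
    "Position on a line": 3.25,
    "Size of squares": 2.2,
    "Hue": 3.1,
    "Brightness": 2.1,
    "Line length": 2.8,
    "Line orientation": 3.0,
}

for feature, bits in capacities.items():
    print(f"{feature}: {bits} bits ~ {bits_to_levels(bits):.1f} levels")
```

For instance, `levels_to_bits(8)` returns 3.0, and the "6-7 unique values" summary corresponds to roughly 2.6-2.8 bits.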

Metrics

Note: Combining features does not add up their capacities!…

  • Hue and saturation: 3.6 bits
  • Size, brightness and hue: 4.1 bits
  • Position in a square: 4.6 bits
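To see how far the combined capacities fall below a simple sum, the sketch below adds up the single-feature values from the previous slide and compares them with the combined values quoted here. Treating "position in a square" as two independent positions on a line (x and y) is an assumption made for illustration; hue and saturation are left out because no single-feature value for saturation is given.

```python
# Single-feature capacities quoted on the previous slide (bits)
SINGLE = {"position_on_line": 3.25, "size": 2.2, "hue": 3.1, "brightness": 2.1}

# Combined capacities quoted on this slide (bits).
# Assumption: "position in a square" = two positions on a line (x and y).
COMBINED = {
    ("size", "brightness", "hue"): 4.1,
    ("position_on_line", "position_on_line"): 4.6,
}

def naive_sum(features):
    """Capacity we would get if the bits of the individual features simply added up."""
    return sum(SINGLE[f] for f in features)

for features, observed in COMBINED.items():
    naive = naive_sum(features)
    print(f"{' + '.join(features)}: naive {naive:.2f} bits (~{2 ** naive:.0f} levels), "
          f"observed {observed} bits (~{2 ** observed:.0f} levels)")
```

Size, brightness and hue would give 7.4 bits (about 169 levels) if capacities added up, versus the 4.1 bits (about 17 levels) actually observed.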

Metrics

Relative judgement: comparing two values of a feature

Errors (in increasing order)

  • Position along a common scale
  • Length
  • Angle
  • Area
  • Volume
  • Color hue


–> Pie charts (angle, area) are less effective than bar charts (position along a common scale, length)

Principles of good visualization

  • Use intuitive mapping to aesthetics
    • Visualization type is adapted to the user's background
    • Geographical coordinates –> X, Y; temperature –> color
    • Use correct mapping
      • Ordinal variables: X, Y, saturation, orientation
      • Nominal variables: shape, texture, hue
  • Support view modifications
    • Scrolling, zooming
    • Color map
    • Mapping aesthetics
    • Scales
    • Level of details

Principles of good visualization

  • Do not put too much information in the display (occlusion)
  • Add keys, labels, legends, grids with care
  • Use the display efficiently (choose the axis range deliberately: a 0%-100% scale vs. the actual data domain)

Principles of good visualization

Color:

  • Keep the number of colors low (5-6 distinct)
  • Use redundant mappings (color+size)
  • Include labeled color key
  • Use resonant colors

Principles of good visualization

Aesthetics:

  • Important findings should be visually emphasized
  • Most important components in the center
  • Do not put too much information into one display

Other:

  • The aspect ratio of the plot should normally be horizontal:vertical = 1.5:1
  • Text in the graph should normally be horizontal
  • Caption and source should be present and informative
  • In bar charts, bars should normally be sorted
  • Axis labels should be present

Misleading graphs

  • Scaling and perspective problem

Misleading graphs

Abusing dimensionality/wrong mapping

  • A scalar is mapped to the size of a cube
  • Wrong mapping: the scalar is mapped to the radius instead of the area
    • If $R_1 = 2R_2$, then $A_1 = 4A_2$: a value twice as large looks four times as large!
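A minimal sketch of why these mappings mislead: if a value ratio is mapped to a length (radius or cube edge), the visible area or volume grows with the square or cube of that ratio. Mapping the value to the area instead means taking the radius proportional to the square root of the value. The mapping names below are hypothetical labels chosen for this example.

```python
import math

def apparent_ratio(value_ratio: float, mapping: str) -> float:
    """How many times larger the glyph looks (by area or volume) when one value
    is `value_ratio` times another, for a given mapping."""
    if mapping == "radius":      # value -> circle radius: area grows with the square
        return value_ratio ** 2
    if mapping == "cube_edge":   # value -> cube edge: volume grows with the cube
        return value_ratio ** 3
    if mapping == "area":        # value -> area: visible size matches the data
        return value_ratio
    raise ValueError(f"unknown mapping: {mapping}")

def radius_for_area(value: float) -> float:
    """Radius whose circle area equals `value` (area proportional to the data)."""
    return math.sqrt(value / math.pi)

print(apparent_ratio(2, "radius"), apparent_ratio(2, "cube_edge"), apparent_ratio(2, "area"))
```

Doubling a value therefore looks like a 4x change with the radius mapping and an 8x change with a cube, but only 2x when the area itself encodes the value.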

Misleading graphs

  • Mixing data of different nature/scales
    • Example: one time-series plot with two series, Price and Amount, both on the same Y axis
  • Smoothed/filtered data interpreted as raw data
    • How good was the smoothing?
  • Using insufficiently sampled data

Basic plots

  • Categorical (qualitative) variable:
    1. Computing summaries (ex. frequencies)
    2. Visualizing as bar or pie charts
  • What to analyse:
    • Largest and smallest bar or slice
    • For sorted bars, sudden shifts in level
    • Compare first within groups, then the differences between groups
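Step 1 (computing summaries) can be sketched with the standard library: count each category, convert counts to percentages (what a pie chart shows), and sort by decreasing count so bars come out sorted. The sample below is hypothetical and is not the data set used in the lecture.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows sorted by decreasing count."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]

# Hypothetical sample of cylinder counts (not the lecture's data)
cylinders = [4, 6, 8, 8, 4, 8, 6, 4, 8, 8]
for cat, n, pct in frequency_table(cylinders):
    print(f"{cat} cylinders: {n} cars ({pct:.1f}%)")
```

The resulting rows can be passed directly to any bar or pie chart function; the sorting makes the largest and smallest bars, and sudden shifts in level, easy to spot.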

Basic plots

Example: visualizing the number of gears and the number of cylinders in cars

[Figure: bar charts of Count by number of cylinders (4, 6, 8) and by number of gears (3, 4, 5)]

Basic plots

[Figure: pie chart of the number of cylinders: 8 (43.8%), 4 (34.4%), 6 (21.9%)]

Visualization pipeline

  • Dimension reduction
    • PCA
    • MDS
    • Correspondence analysis (nominal)
    • Other techniques (e.g., ICA, autoencoders) are covered in the Machine Learning course

Principal Component Analysis (PCA)
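The slide gives no details, so here is a minimal sketch assuming the usual definition: center the data, take the singular value decomposition, and project onto the first k principal directions. If the variables are on different scales, they would normally be standardized first; this sketch does not do that. The data below are random and hypothetical.

```python
import numpy as np

def pca(X: np.ndarray, k: int = 2):
    """Project rows of X (n objects x p variables) onto the first k principal
    components. Returns (scores, explained_variance_ratio)."""
    Xc = X - X.mean(axis=0)                       # center each variable
    # SVD of centered data: rows of Vt are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                        # coordinates in k dimensions
    var = s ** 2 / (X.shape[0] - 1)               # variance along each component
    return scores, var[:k] / var.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # hypothetical data: 50 objects, 5 variables
scores, ratio = pca(X, k=2)
print(scores.shape, ratio)
```

The two score columns can then be shown in an ordinary scatterplot, which is the same kind of "map" that MDS produces.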

Distance between objects

  • Meaning of "two objects are close"?
  • Measure of proximity (e.g., for quantitative variables, Euclidean distance)

  • Similarity measure $s_{rs}$ (= 1 for the same object, < 1 otherwise)
    • e.g., correlation
  • Dissimilarity measure $\delta_{rs}$ (= 0 for the same object, > 0 otherwise)
    • e.g., Euclidean distance
  • Problems in constructing proximity measures:
    • What if the variable is qualitative?
    • What if the object is a text document?
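For quantitative variables, both kinds of proximity matrix can be sketched with NumPy: Euclidean distances between rows give $\delta_{rs}$, and correlations between rows (each object's profile across the variables) give $s_{rs}$. The function names and the toy data are illustrative assumptions.

```python
import numpy as np

def euclidean_dissimilarity(X: np.ndarray) -> np.ndarray:
    """delta_rs: Euclidean distance between rows r and s (0 on the diagonal)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def correlation_similarity(X: np.ndarray) -> np.ndarray:
    """s_rs: Pearson correlation between the profiles of objects r and s."""
    return np.corrcoef(X)   # rows are treated as the objects

# Hypothetical data: 3 objects described by 3 quantitative variables
X = np.array([[0.0, 0.0, 1.0],
              [3.0, 4.0, 1.0],
              [1.0, 2.0, 3.0]])
print(euclidean_dissimilarity(X))
print(correlation_similarity(X))
```

One common (but not unique) way to turn a similarity into a dissimilarity is $\delta_{rs} = 1 - s_{rs}$.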

Multidimensional scaling (MDS)

Given n objects with a known matrix of similarities or dissimilarities, where each object i is characterized by a p-dimensional vector $X_i$.

The aim:
  • Represent these objects in a lower dimension ($p'$ = 2 or 3) such that the distances $d_{rs}$ between the new points reflect the matrix of similarities (or dissimilarities $\delta_{rs}$)
  • See neighbouring observations
  • See clusters and outliers
  • Have a "map" of your data

MDS

Two types of MDS:

  • Metric MDS
  • Non-metric MDS

Metric MDS

(algorithm is not discussed here)

Search for points $\chi_1,\dots,\chi_n$ such that the discrepancy between $\|\delta_{rs}\|$ and $\|d_{rs}\|$ is minimized
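For illustration only, here is one well-known metric variant, classical (Torgerson) scaling, which has a closed-form solution via an eigendecomposition instead of iterative minimization. Strictly, it fits inner products rather than minimizing the distance discrepancy directly, but when the $\delta_{rs}$ are Euclidean distances it reproduces them exactly in enough dimensions. This is not necessarily the algorithm used in the course software.

```python
import numpy as np

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) metric MDS.
    D: n x n matrix of dissimilarities delta_rs. Returns an n x k configuration."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]           # keep the k largest
    L = np.sqrt(np.clip(eigvals[idx], 0, None))   # negative eigenvalues set to 0
    return eigvecs[:, idx] * L

# Hypothetical example: distances between 4 points in the plane
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
Y = classical_mds(D, k=2)
```

The configuration `Y` is determined only up to rotation, reflection and translation, so its coordinates differ from `pts`, but its pairwise distances match `D`.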

Non-metric MDS

Given n objects $X_1,\dots,X_n$ with a known matrix $\|\delta_{rs}\|$ of similarities or dissimilarities.

For a configuration $\chi_1,\dots,\chi_n$ (in a lower dimension) with distance matrix $\|d_{rs}\|$, define the stress $S(\chi_1,\dots,\chi_n)$ by

  1. Computing $d'_{rs}$ as a monotonic regression of $\|d_{rs}\|$ on $\|\delta_{rs}\|$
  2. Computing $S = \sqrt{\dfrac{\sum_{r,s}(d_{rs}-d'_{rs})^2}{\sum_{r,s} d_{rs}^2}}$
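The two steps above can be sketched for a single configuration: sort the pairs by $\delta_{rs}$, fit a non-decreasing sequence $d'_{rs}$ to the $d_{rs}$ with the pool-adjacent-violators algorithm (one standard way to compute a monotonic regression), and plug the result into Kruskal's stress. Inputs are condensed vectors over the pairs r < s; ties in $\delta_{rs}$ are simply kept in input order, which is one convention among several. The numeric optimization over $\chi_1,\dots,\chi_n$ is not shown.

```python
import numpy as np

def monotone_regression(y: np.ndarray) -> np.ndarray:
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block: [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def kruskal_stress(delta: np.ndarray, d: np.ndarray) -> float:
    """Stress S given condensed vectors of delta_rs and d_rs (pairs r < s)."""
    order = np.argsort(delta, kind="stable")      # pairs sorted by delta_rs
    d_hat = np.empty_like(d, dtype=float)
    d_hat[order] = monotone_regression(d[order])  # d'_rs, monotone in delta_rs
    return float(np.sqrt(((d - d_hat) ** 2).sum() / (d ** 2).sum()))

delta = np.array([1.0, 2.0, 3.0, 4.0])
print(kruskal_stress(delta, np.array([0.5, 1.0, 2.0, 5.0])))  # d monotone in delta
print(kruskal_stress(delta, np.array([2.0, 1.0, 3.0, 4.0])))  # one order violation
```

Because only the rank order of $\delta_{rs}$ enters through the monotonic regression, any configuration whose distances preserve that order has zero stress.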

How to find optimal configuration?

  • Use numeric optimization to minimize S(χ1,…,χn)

MDS- examples

Music data
  • Artist (Abba, Beatles, Eels, Vivaldi, Mozart, Beethoven, Enya)
  • Type (rock, classical, new wave)
  • lvar, lave, lmax, lfener, lfreq - parameters of the music signal

Metric MDS

  • Mapping into two dimensions and plotting as a scatterplot
[Figure: scatterplot of the metric MDS configuration; axes: V1, V2]

Non-metric MDS

  • Mapping into three dimensions, coloring by Artist, and plotting as a 3D scatterplot:
[Figure: 3D scatterplot; legend (Artist): Abba, Beatles, Beethoven, Eels, Enya, Mozart, Vivaldi]

Shepard plot

  • Plot of $d_{rs}$ vs. $\delta_{rs}$
  • For non-metric MDS, also displays the monotonic regression values $d'_{rs}$
  • Shows the quality of the MDS fit: best if the scatter resembles a monotonic curve
[Figure: Shepard plot; x axis: delta, y axis: D; traces: trace 0, trace 1]

Read at home

  • Book, chapters 3.1, 3.3, 3.5, 13
  • Cox, M.A.A., and Cox, T.F.: "Multidimensional scaling." Handbook of Data Visualization. Springer, Berlin, Heidelberg, 2008, pp. 315-347.
  • Plotly book, ch 2.3